This is a plot of the VP over time, coloured by the type of crude. There’s obvious cyclicality to it, as well one can begin to see how crude type affects VP.
ggplotly(a_timeline) %>%
layout(
xaxis = list(
rangeselector = list(
buttons = list(
list(
count = 3,
label = "3 mo",
step = "month",
stepmode = "backward"),
list(
count = 6,
label = "6 mo",
step = "month",
stepmode = "backward"),
list(
count = 1,
label = "1 yr",
step = "year",
stepmode = "backward"),
list(
count = 1,
label = "YTD",
step = "year",
stepmode = "todate"),
list(step = "all"))),
rangeslider = list(type = "date")
)
)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
VP is plotted as a series of Box Plots, and then is split by crude type and month. Unfortunately at the time of writing, we don’t have a ton of good data (data w/ Density and Sulfur) for May onward, so it is hard to visualize seasonality. That being said, I’m guessing there will be more seasonality in some of the heavier crudes.
b_boxplot_facet
Here we can see how each of the different types of Crude are affected by changes in temperature. You may notice that for the most part they seem to be negatively correlated, with the exception of Medium SW. I’m guessing Medium SW is also negatively correlated, but I’ve only got 3 data points for it, which is not nearly enough to draw any meaningful conclusion.
d_scatter_facet
One can see how the distributions change between crude types. There’s a fair bit of multimodality going on in each of the plots. As well, the spread tends to increase as you move to the heavier crudes.
c_violin
ggplotly(h_VP_kern)
ggplotly(e_VP_Prod)
\[ \text{VP} = 144.1139 + 0.0775x_{1} - 0.2842x_{2} - 0.1066x_{3} - 0.0685x_{4} + 10.7657x_{5} + 9.1914x_{6} + 7.7554x_{7} - 0.1218x_{8} - \] \[ 10.1773x_{9} - 13.4757x_{10} + \epsilon \] \[ \vec{x} = \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \\ x_{6} \\ x_{7} \\ x_{8} \\ x_{9} \\ x_{10} \end{bmatrix} = \begin{bmatrix} \text{Density} \\ \text{7 Day Temperature Rolling Average} \\ \text{Oxbow} \\ \text{Kipling} \\ \text{C5+} \\ \text{Light LSB} \\ \text{Light SW} \\ \text{Medium LSB} \\ \text{Medium SW} \\ \text{Midale} \end{bmatrix} \] \[ MSE = 105.7478 \implies \sqrt{MSE} = 10.2834\text{kPa} \] \[ MSE_{\text{VP.Rand.Split}} = 64.8653 \implies \sqrt{MSE_{\text{VP.Rand.Split}}} = 8.0539\text{kPa} \]
The models with the “.Rand” added to them have a bit of noise added to the prediction via a normal distribution: \[ \text{VP.Rand} \sim \mathcal{N} \left( 0, \frac{\sigma^{2}}{\sqrt{2}} \right) \] \[ \text{VP.Rand.Split} \sim \mathcal{N} \left( 0, \frac{{\sigma_{i}}^{2}}{\sqrt{2}} \right), i = \{\text{C5+, Light LSB, ... , Heavy SW}\} \]
We are looking for a model that has very little space between the Bootstrapped output, and the model. What I’ve found is that the predictive model on its own is too deterministic, and while the overall shape of it isn’t far off from what we want, it does not do a good job at predicting results in the tails of the distributions. By adding some noise to the output, I was able to improve the fit greatly.
ggplotly(i_kern_plotly)
## Warning: Removed 489 rows containing non-finite values (stat_density).
i_kern_marginal
## Warning: Removed 489 rows containing non-finite values (stat_density).
I’ve checked the Empirical Cumulative Distribution Functions (ECDFs) as well as a means of validating my model:
i_ecdf
## Warning: Removed 489 rows containing non-finite values (stat_ecdf).
i_ecdf_marginal
## Warning: Removed 489 rows containing non-finite values (stat_ecdf).
This provides a good idea at how well the model does at estimating the spread and skewness of the underlying distributions.
j_violin
## Warning: Removed 489 rows containing non-finite values (stat_ydensity).
## Warning: Removed 489 rows containing non-finite values (stat_boxplot).
## Warning: Removed 489 rows containing non-finite values (stat_boxplot).
## Warning: Removed 489 rows containing missing values (geom_point).
j_violin_marginal
## Warning: Removed 489 rows containing non-finite values (stat_ydensity).
## Warning: Removed 489 rows containing non-finite values (stat_boxplot).
## Warning: Removed 489 rows containing non-finite values (stat_boxplot).
## Warning: Removed 489 rows containing missing values (geom_point).